On High Dimensional Skylines
Abstract
In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multidimensional dataset. In a high-dimensional space, however, skyline points no longer offer interesting insights because there are too many of them. In this paper, we introduce a novel metric, called skyline frequency, that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different subsets of the dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting because it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding the top-k frequent skyline points. However, the algorithms proposed thus far for skyline computation typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.
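The definition of skyline frequency above can be illustrated with a small brute-force sketch. This is not the approximate algorithm the abstract refers to; it is an exact enumeration over every non-empty subspace, so it is exponential in the number of dimensions and only suitable for tiny inputs. The function names (`dominates`, `skyline`, `skyline_frequency`, `top_k_frequent`) are hypothetical, and the sketch assumes that smaller values are preferred on every dimension.

```python
from itertools import combinations


def dominates(p, q, dims):
    """Return True if p dominates q on the subspace `dims`.

    Assumption: smaller is better. p dominates q if it is no worse on
    every dimension in `dims` and strictly better on at least one.
    """
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points, dims):
    """Indices of the points not dominated by any other point on `dims`."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p, dims) for j, q in enumerate(points) if j != i)
    ]


def skyline_frequency(points):
    """Count, for each point, the non-empty subspaces in which it is a skyline point.

    Exact brute force: enumerates all 2^d - 1 subspaces.
    """
    n_dims = len(points[0])
    freq = [0] * len(points)
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            for i in skyline(points, dims):
                freq[i] += 1
    return freq


def top_k_frequent(points, k):
    """Return the k (index, frequency) pairs with the highest skyline frequency."""
    freq = skyline_frequency(points)
    order = sorted(range(len(points)), key=lambda i: -freq[i])
    return [(i, freq[i]) for i in order[:k]]
```

For the four illustrative points `(1, 4, 3)`, `(2, 2, 2)`, `(3, 1, 5)`, `(4, 4, 4)`, the sketch yields frequencies `[4, 5, 4, 0]` over the 7 subspaces, so the second point ranks first. The exponential loop over subspaces is exactly the cost that motivates an approximate method.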
Similar resources
Mining Thick Skylines over Large Databases
A new operator called skyline [3] has recently attracted interest; it returns the objects that are not dominated by any other object with regard to certain measures in a multidimensional space. Recent work on the skyline operator [3, 15, 8, 13, 2] focuses on efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, whi...
K-Dominance in Multidimensional Data: Theory and Applications
We study the problem of k-dominance in a set of d-dimensional vectors, prove bounds on the number of maxima (skyline vectors) under both worst-case and average-case models, perform experimental evaluation using synthetic and real-world data, and explore an application of the k-dominant skyline for extracting a small set of top-ranked vectors in high dimensions where the full skylines can be unmanag...
SkyDB: Skyline Aware Query Evaluation Framework
In recent years, much attention has been focused on evaluating skylines; however, the existing techniques primarily focus on skyline algorithms over single sets. These techniques face two serious limitations: (1) they define skylines to work on a single set only, and (2) they treat skylines as an “add-on”, loosely integrated on top of the query plan. In this work, we investigate the evalu...
Discovering Skylines of Subgroup Sets
Many tasks in exploratory data mining aim to discover the top-k results with respect to a certain interestingness measure. Unfortunately, in practice top-k solution sets are hardly satisfactory, if only because redundancy in such results is a severe problem. To address this, a recent trend is to find diverse sets of high-quality patterns. However, a ‘perfect’ diverse top-k cannot possibly exist...
Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces
The skyline operator is important for multicriteria decision making applications. Although many recent studies developed efficient methods to compute skyline objects in a specific space, the fundamental problem on the semantics of skylines remains open: Why and in which subspaces is (or is not) an object in the skyline? Practically, users may also be interested in the skylines in any subspaces....